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Abstract 

Background: Autogenous c/s-regulators of ribosomal protein synthesis play a critical role in maintaining the 
stoichiometry of ribosome components. Structured portions within an mRNA transcript typically interact with 
specific ribosomal proteins to prevent expression of the entire operon, thus balancing levels of ribosomal proteins 
across transcriptional units. Three distinct RNA structures from different bacterial phyla have demonstrated 
interactions with SI 5 to regulate gene expression; however, these RNAs are distributed across a small fraction of 
bacterial diversity. 

Results: We used comparative genomics in combination with analysis of existing transcriptomic data to identify 
three novel putative RNA structures associated with the SI 5 coding region in microbial genomes. These structures 
are completely distinct from those previously published and encompass potential regulatory regions including 
ribosome-binding sites. To validate the biological relevance of our findings, we demonstrate that an example of the 
Alphaproteobacterial RNA from Rhizobium radiobacter specifically interacts with S15 in vitro, and allows in vivo 
regulation of gene expression in an £ coli reporter system. In addition, structural probing and nuclease protection 
assays confirm the predicted secondary structure and indicate nucleotides required for protein interaction. 

Conclusions: This work illustrates the importance of integrating comparative genomic and transcriptomic 
approaches during de novo ncRNA identification and reveals a diversity of distinct natural RNA regulators that 
support analogous biological functions. Furthermore, this work indicates that many additional uncharacterized RNA 
regulators likely exist within bacterial genomes and that the plasticity of RNA structure allows unique, and likely 
independently derived, solutions to the same biological problem. 



Background 

Over the last two decades numerous RNA regulators have 
been discovered that range in function from directing 
development in eukaryotes [1], to controlling bacterial 
virulence [2] and metabolism [3]. These RNA regulators 
vary considerably in their sizes (from less than 20 to 
greater than 200 nucleotides), required processing pro- 
teins (e.g. Argonaut, Dicer, RNase P), and mechanisms 
of action (inhibition of transcription or translation; 
utilization of Watson-Crick base-pairing to recognize 
the transcript of interest, or the formation of complex 
tertiary structure to enable specific protein-binding or 
metabolite-sensing) [4-7]. This extensive variability suggests 
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that different regulatory RNAs have very diverse evolution- 
ary origins. 

The independent evolution of RNA regulators is espe- 
cially apparent in bacteria due to the large number of se- 
quenced genomes, and the breadth of biological diversity 
these genomes represent. Several metabolic processes are 
controlled by completely distinct RNA regulatory mecha- 
nisms in different bacterial species. These include methio- 
nine biosynthesis, which is regulated by at least three 
completely distinct 5-adenosyl methionine (SAM) -binding 
riboswitch architectures [8-11], and glucosamine metabol- 
ism where both a glucosamine-6-phosphate responsive 
ribozyme [12] and a series of small RNAs [13] regulate the 
same pathway in different bacterial species. 

The independent derivation of RNA regulators is not 
restricted to the control of metabolic pathways. In E. 
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coli, the over fifty genes encoding ribosomal proteins are 
localized to approximately twenty transcriptional units, 
and the stoichiometry of ribosomal proteins is partially 
maintained through negative autogenous regulation. In 
this paradigm, excess levels of a given ribosomal protein 
that are not rapidly assembled into the ribosome, bind 
a regulatory RNA structure (often located in the 5'- un- 
translated region) to prevent further transcription or 
translation of the ribosomal protein effector and other 
proteins encoded by the operon [14,15]. Discovered over 
40 years ago [16], autogenous regulation of ribosomal 
protein biosynthesis has been well-characterized in E. 
coli [15]. While several of these RNA regulators appear 
to be conserved throughout many bacterial species, in 
most cases these protein-binding sites are narrowly dis- 
tributed to a few orders of Gammaproteobacteria [17]. 
In addition there are also many cases where alternative 
RNA structures that interact with homologous riboso- 
mal proteins are present in different bacterial phyla [18]. 
Examples include RNAs that interact with ribosomal 
proteins S4, L20, and S15 [14,19-24]. The widespread 
nature of RNA-based autogenous regulation as a mech- 
anism for the control of ribosomal protein synthesis 
coupled with the narrow distribution of most RNA regula- 
tors, and the existence of alternative RNA structures in a 
few cases, strongly suggests that many similar such mech- 
anisms remain to be discovered in other bacterial phyla. 

The RNAs that interact with ribosomal protein S15 
are representative of what is likely a common phenomenon. 
To date, three different RNA structures that interact with 



ribosomal protein S15 have been identified in Escherichia 
coli, Thermus thermophilus, and Geobacillus stearothermo- 
philus (formerly Bacillus stearothermophilus). Each RNA 
structure appears to have distinct binding determinants, 
and they bear little resemblance to the rRNA binding-site 
for S15 [25,26]. Yet, each allows negative regulation of 
rpsO, the gene encoding S15. Despite their shared func- 
tion, the RNA structures show no obvious sequence or 
structural similarity (Figure 1). While this collection of 
RNA regulators already highlights RNA structural diver- 
sity, examination of their phylogenetic distributions indi- 
cates that most bacterial phyla have no previously described 
S15 regulation [17,18]. This suggests that there are many 
more RNAs remaining to be described. 

In this work, we implement a framework for computa- 
tional identification of structured RNA in bacterial ge- 
nomes and apply it to genomic regions proximal to the 
S15 coding region (rpsO) to assess the diversity of S15- 
interacting RNAs in bacteria. Our search resulted in many 
putative structured RNAs across different phyla of bacteria. 
Sequence alignments corresponding to several of these pu- 
tative RNA structures were further examined to determine 
phylogenetic distributions and identify transcription start 
sites from available RNA-seq data. Additionally, to estab- 
lish the biological relevance of our results, we experimen- 
tally demonstrate that one of these RNAs, originating from 
the alphaproteobacterium Rhizobium radiobacter (also 
called Agrobacterium tumefaciens), has the expected bio- 
logical function. We validate specific interactions between 
the predicted RNA structure (Rra-RNA) and the S15 
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Figure 1 Previously identified ribosomal protein S15-interacting RNA leader sequences and structures originating from different phyla 
of bacteria. Three regulatory RNA structures have been previously reported for ribosomal protein S15. A: Escherichia coli [22], B: Thermus 
thermophilus [24], and C: Geobacillus stearothermophilus [23]. In each structure the rpsO start codon is boxed, and a red bar is placed over the 
ribosome-binding site. 
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protein from the same species (Rra-S15) using in vitro 
binding assays, and pinpoint regions of the RNA important 
for protein-interaction using mutagenesis and truncation. 
The RNA's secondary structure is further confirmed using 
structural probing assays. Finally, we also demonstrate that 
the Rra-RNA regulates gene expression in response to Rra- 
S15 using an E. coli surrogate reporter system. 

Results and discussion 

Comparative genomics identifies several putative RNA 
structures associated with rpsO 

To identify putative RNA structures associated with 
the coding region for ribosomal protein S15 {rpsO), we 
implemented a computational pipeline, GAISR (Genomic 
Analysis for Illuminating Structured RNA, Figure 2) for de 
novo ncRNA discovery and candidate refinement. GAISR 
is based on existing RNA discovery pipelines [27] that 
have been very successful at identification of ncRNA can- 
didates [28,29]. GAISR utilizes several pre-existing tools, 
including CMfinder, a de novo ncRNA discovery tool [30], 
and Infernal 1.1, an RNA homology search tool [31] to 
streamline sequence selection, identify potential ncRNAs, 
and efficiently detect additional homologues for putative 
RNA structures. We used GAISR to examine the genomic 
region corresponding to the 5 -untranslated region of the 
gene encoding S15, rpsO, in fully sequenced bacterial ge- 
nomes. From the initial search we identified 52 potential 
ncRNA sequences, originating from 16 initial phylogenetic 
sequence clusters. 

From these initial sequences, we identified five promis- 
ing RNA structures. Among these structures were the 
two known RNAs that allow regulation of rpsO in Firmi- 
cutes and Gammaproteobacteria [17,18]. Of note, the 



RNA structure reported for Thermus thermophilus was 
not identified by our search, suggesting that more RNAs 
may be present that were not uncovered here. There are 
several potential reasons for this result including biases 
in sequence coverage (there were only 19 sequences de- 
rived from Deinococcus/Thermus available for analysis), 
and our use of a single RNA discovery tool for identifi- 
cation of RNA structures may limit our ability to identify 
putative RNA structures. No tool for RNA de novo dis- 
covery is designed to identify potential pseudoknotted 
structures, yet these are very common in biologically 
functional RNAs [32]. The pseudoknotted structures we 
have identified (e.g. from Gammaproteobacteria) are typ- 
ically identified as individual helices by CMfinder and 
manually merged during the curation process. 

Alignments corresponding to the three promising novel 
structures were curated and additional examples identified 
using Infernal homology searches. In addition, the phylo- 
genetic distribution of each putative ncRNA was exam- 
ined, and each alignment was compared with existing 
RNA-seq data to identify regions likely to be within the 
rpsO transcript. Consensus diagrams of the three candi- 
date ncRNAs are shown in Figure 3 (A-C) and the align- 
ments that correspond to these structures may be found 
as Additional files 1, 2 and 3. RNA-secondary structures 
determined from analysis of large phylogenies are often 
well defined by co-varying nucleotide positions. However, 
individual sequences corresponding to the RNA structures 
we identified contain extensive variability including many 
non-canonical base-pairs and variable-length regions out- 
side of the very well-conserved regions that are likely dir- 
ectly involved in protein-binding. Thus the secondary 
structure predictions in Figure 3 should be considered 
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Figure 2 Overview of comparative genomic pipeline: Genomic Analysis for Illuminating Structured RNA (GAISR). Putative 5'-untranslated 
regions (S'-UTRs) of our gene of interest (rpsO) are identified within completed microbial genomes and clustered by their taxonomic group. 
CMfinder is used to identify potential conserved RNA structures within these sequence clusters [30]. Following RNA structure identification, the 
candidate RNA structures are manually inspected and additional homologs are identified using Infernal 1.1 [31]. The genomic context of putative 
homologs is assessed, and they are incorporated into the alignment using cmaiign. The alignment is then typically manually inspected to identify 
potential pseudoknots or other regulatory features and the curation process may be repeated several times. Finally, transcriptomic data is sought 
to identify the transcription start site of the putative ncRNA. 
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Figure 3 Consensus diagrams of novel putative RNA structures and individual examples used for transcriptomic analysis. Novel 
regulatory RNAs we have identified upstream of the rpsO operon. HO helices were originally predicted by comparative genomics, but not 
supported by transcriptomic analysis and therefore are unlikely to be biologically relevant. A: RNA originating from Alphaproteobacteria, B: RNA 
originating from Actinobacteria, and C: RNA originating from Chlamydia. D: Alphaproteobacterial RNA example originating from Rhodobacter 
spaeroides (NC_01 1963.1) showing putative transcription start site determined from analysis of RNA-seq reads (Additional file 4: Figure S1A) [34]. 
E: Actinobacterial RNA example originating from Mycobacterium tuberculosis (NC_000962.3) showing putative transcription start site determined 
from analysis of RNA-seq reads (Additional file 4: Figure S1B). F: Chlamydia RNA example originating from Chlamydia trachomatis with previously 
determined transcription start site (NC_010280.1/275170) [37]. 



tentative. However, the degree of conservation observed 
here is consistent with that observed for other ribosomal 
protein-interacting regulatory RNAs that have been experi- 
mentally validated in the past [17,18]. Therefore despite 
the sequence and structure variability, we believe that the 
RNAs we identified are likely to have a regulatory function. 

RNAs identified are diverse in sequence and secondary 
structure 

Our first RNA (Figure 3A) was identified in greater than 
90% of species within the Alphaproteobacteria orders of 



Rhizobiales, Rhodobacterales, Rhodospirillales, Caulobac- 
terales, and Sphingomonadales. However, only a single ex- 
ample of the RNA was found in a Rickettsiales species 
(from 58 genomes explored), potentially reflective of gen- 
ome reduction in most Rickettsiales species [33]. Our ori- 
ginal putative RNA structure included three predicted 
pairing elements (H0-H2). In -50% of examples there 
is also a long-linker region between HI and H2 (up to 
400 nt) that is typically base-paired, although the precise 
position of this base-pairing within the sequence does not 
appear to be well-conserved (see Additional file 1 for 
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alignment). The most highly conserved portion of the 
putative RNA is the HI helix. This helix shows exten- 
sive evidence of co-variation and the loop region is 
highly conserved suggesting that it is important for pro- 
tein binding. The H2 helix is less conserved, but typic- 
ally encompasses a putative ribosome-binding site in 
the 3 ' portion. While HO shows some co-variation, the 
loop region is not well conserved in sequence or length. 
In addition, transcriptomic analysis of RNA-seq data de- 
rived from Rhodobacter spaeroides [34] (Additional file 4: 
Figure SI A) suggests that the 5' portion of this pairing 
element is not transcribed (Figure 3D), thus we believe 
that the originally predicted HO pairing element is likely 
not part of the biologically relevant RNA. 

Our second RNA (Figure 3B) was identified mainly in 
the Actinomycetales order of Actinobacteria. The puta- 
tive RNA structure contains a kissing-loop pseudo- 
knotted structure that bears faint resemblance to the 
RNA structure originating from E. coli (Figure 1A), and 
there are weakly scoring homologs that appear in vari- 
ous Gammaproteobacteria (e.g. Pseudomonas) lacking 
the known E. coli S15 regulator [17]. However, the clos- 
ing pseudoknot occurs prior to any potential regulatory 
sequences suggesting that the "entrapment" mechanism 
proposed for the E. coli RNA is not likely to play a role 
here [35,36]. Like the RNA described above, a ribosome- 
binding site is apparent in the 3 ' portion of the H2 helix, 
suggesting a potential translational regulatory mechanism 
(see Additional file 2 for alignment). Analysis of RNA-seq 
data from Mycobacterium tuberculosis (Additional file 4: 
Figure SIB), suggests that the transcription start site for 
this RNA is approximately 10 nucleotides upstream from 
the start of the first predicted pairing element (Figure 3D). 

Our third RNA originates from Chlamydia, and is the 
one in which we have the least confidence, mainly due 
to the limited sequence diversity available for analysis 
(Figure 3E, see Additional file 3 for alignment). However, 
there is a very strongly conserved hairpin overlapping start 
codon of rpsO in approximately 30 sequenced strains of 
Chlamydia and a second potential short pairing element 
displaying some covariation and compatible mutations. In 
our original prediction, this hairpin was significandy ex- 
tended (HO). However, pre-existing analysis of transcript 
start sites in Chlamydia trachomatis indicates that the 
transcript start site is just upstream of HI (Figure 3F) [37]. 
Therefore we believe that HO is likely not part of the bio- 
logically relevant RNA. Notably, very few regulatory RNAs 
have been identified in Chlamydia. Only examples of the 
TPP and cobalamin riboswitches have been identified in 
this class of bacteria [38], and in these cases there appear 
to be only isolated sequences rather than elements that 
are conserved in many genomes. 

The process of curating our original alignments, and 
in particular the incorporation of RNA-seq data, was 



critical for narrowing our focus to the portions of the 
predicted RNAs that are most likely to be biologically 
relevant. In two cases, transcriptomic data allowed us to 
determine that putative hairpins predicted through com- 
parative genomics are unlikely to be part of the transcript. 
Our analysis exemplifies that in assessing the biological 
relevance of a given ncRNA candidate it is important to 
determine whether a putative RNA is actually transcribed 
as well as identify the transcription start site of the RNA 
candidate [39]. Thus, archives that consolidate RNA-seq 
data, and provide easily accessible read-depth information 
for many bacterial species are of great importance moving 
forward in RNA comparative genomics. 

RNA from alphaproteobacterium Rhizobium radiobacter 
specifically interacts with S15 protein 

To experimentally validate the biological relevance of our 
results, we further examined an example of the Alphapro- 
teobacterial RNA originating from Rhizobium radiobacter 
(NC_003062, organism also known as Agrobacterium fab- 
rum strain C58, and formerly known as Agrobacterium 
tumefaciens strain C58). The sequence from R. radiobac- 
ter conforms well to our consensus structure, containing 
the highly conserved HI, and the predicted H2 pairing 
element. In addition, this sequence is one where the re- 
gion directly preceding our transcription start site has the 
potential to base-pair with the 5 '-most portion of the 
RNA. We designated this helix HO due to its position 5' of 
the predicted transcription start site. We first tested the 
full-length version of the RNA (nucleotides -108 to +27) 
and called it Rra-RNAl because it was the first RNA 
tested from this organism (Figure 4A). To examine 
whether this RNA interacts specifically with S15 protein 
from the same organism (Rra-S15) we utilized filter- 
binding assays [40]. These assays confirmed that Rra- 
RNAl binds Rra-S15 with nanomolar affinity (Figure 4A, 
B; Kd = 22.2 ± 0.7 nM). This value is similar to those re- 
ported for the interactions between S15 and the RNA 
structures originating from E. coli (45 nM) [35], G. 
stearothermophilus (20 nM) [23], and the T. thermophilus 
(5nM) [24]. 

Truncation analysis suggests that Rra-S15 minimal bind- 
ing site includes both HI and H2 

To experimentally investigate the validity of the putative 
transcription start site we constructed several 5' trunca- 
tions to the Rra-RNAl sequence and tested their ability 
to bind Rra-S15. Based on the putative transcription 
start site derived from analysis of RNA-seq data from 
Rhizobium spaeroides (at C-95 according to our align- 
ment), the potential HO helix predicted from compara- 
tive genomics in the absence of RNA-seq analysis 
(Figure 3A,D) is unlikely to be necessary for Rra-S15 
binding. We performed 5 -RACE for this RNA to further 
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Figure 4 In vitro nitrocellulose filter-binding assays confirm the transcription start site as well as indicate HI as the region essential for 
Rra-S15 binding. A: Truncation sites and specific mutations to the Rra-RNA. The start codon AUG is boxed, and a red bar is over the ribosome-binding 
site. B: 5' truncations; C: 3' truncations; D: Putative binding-site mutations. Each curve represents at least three independent replicates. For the purposes 
of comparison, the data from Rra-RNA1 was repeated on graphs B & D. Reported Kq measurements represent the protein concentration at which half 
of the maximum percentage of Rra-RNA is protein bound. Max% refers to the maximum percentage of Rra-RNA that interacts with Rra-S1 5 in this 
in vitro assay (see Methods for calculations). 



identify the transcription start site. Although C-95 was 
one of the 5 '-ends identified (Additional file 4: Figure S2), 
this experiment provided multiple 5 -ends and was ultim- 
ately inconclusive. Truncations Rra-RNA2 (nucleotides -91 
to +10) and Rra-RNA3 (nucleotides -79 to +10) appear to 
have negligible effects on Rra-S15 binding (K D =14.5 ± 6.1 
nM and 21 ± 4.8 nM, respectively) (Figure 4A,B). These 
results indicate that all bases upstream of nucleotide -79 
are not required for binding Rra-S15, consistent with the 
putative transcription start site prior to this nucleotide at 
C-95. Binding was not significandy affected until the RNA 
was truncated to G-72, Rra-RNA4 (K D =125 ± 106.5 nM) 
(Figure 4A,B). Collectively, these results suggest the entire 
HO stem and loop are dispensable and the C-95 identified 
during analysis of RNA-seq data from R. sphaeroides likely 
represents the transcription start site. 

To identify the minimal protein binding-site, we exam- 
ined 3 '-truncations to the Rra-RNA (Figure 4A,C). In the 
gammaproteobacterial RNA, the initial amino acid encod- 
ing nucleotides of rpsO form an integral part of the RNA 



structure and function [36]. However, removing the cod- 
ing region of the alphaproteobacterial RNA (Rra-RNA5, 
nucleotides -108 to +5) has minimal effect on the binding 
affinity (K D =16.6 ± 10.8 nM). Rra-RNA6 (nucleotides -108 
to -6) was designed to remove all bases downstream of the 
predicted H2; again, this RNA binds Rra-S15 with an af- 
finity better than that of Rra-RNAl (K D = 11.9 ± 1.8 nM). 
The observed increase in binding affinity is likely due to 
removal of potential alternative competing structures, thus 
allowing a tighter interaction between the protein and the 
RNA. Rra-RNA7 (nucleotides -108 to -31) was designed 
to remove all of predicted hairpin H2 including the five 
uracils (U-26 to U-30) through the putative ribosome 
binding site (purine-rich sequence from A-8 to A-13), 
start codon and subsequent protein coding nucleotides. 
This truncation completely abolishes Rra-S15 binding 
(K D >500). To assess whether slippage along the predicted 
H2 might allow the five uracils (U-26 to U-30) to base- 
pair with the putative ribosome binding site (A-8 to 
A-13), we mutated the polyuridine to a purine-rich 
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sequence to destabilize this alternative pairing (Rra- 
RNA8, Figure 4A). This mutant was able to bind Rra- 
S15 with a similar affinity to the full length Rra-RNAl 
(K D =12.5 ±2.9 nM) suggesting that the pairing we 
have drawn is one that allows for protein binding. 
Based on this data we predict the minimal RNA regula- 
tory region includes nucleotides G-79 through U-6, 
which is fully encompassed by our predicted transcript, 
and includes both of the predicted pairing elements HI 
and H2. 



Mutation analysis suggests a potential binding-site 

The most highly conserved portion of this RNA is 
the HI stem loop (Figure 3A). Due to its high sequence 
conservation, we hypothesize that this region is essential 
for Rra-S15 binding. Mutations made in this stem, Rra- 
RNA9 and Rra-RNAIO, both significantly inhibit binding 
(Figure 4A,D; K D values of >500 and 221 ± 52.3 nM, 
respectively). The compensatory mutation, Rra-RNAl 1, 
was able to partially recover Rra-S15 binding (K D =114 ± 
37 nM). In this compensatory mutant, it is likely an alter- 
native base-pair forms with usually unpaired A-41 and the 
dynamic equilibrium of the two RNA structures allows, 
but does not completely restore, Rra-S15 binding. In com- 
bination with the truncation experiments above, these re- 
sults suggest that Rra-S15 binds its RNA regulator in the 
highly conserved stem-loop structure of HI but that H2 is 
still required for binding. 

The R. radiobacter RNA allows regulation in response to 
SI 5 in vivo 

To determine whether the Rra-RNA has regulatory activ- 
ity in addition to S15-binding activity, we conducted 
in vivo reporter assays to assess regulation. To do this we 
used a GFP reporter to measure expression of the gene 
following the Rra-RNA in response to different levels of 
Rra-S15. The RNA sequence was cloned in-frame as a 
translational fusion with the GFP reporter under the con- 
trol of the IPTG-inducible trc promoter. This construct 
included the rpsO start codon, Shine-Dalgarno sequence, 
and the first nine codons of the rpsO gene to form the 
RNA-GFP fusion. On a second plasmid, the R. radiobacter 
rpsO coding sequence was placed under the control of an 
L-arabinose inducible promoter. The pair of plasmids 
were co-transformed into an E. coli K12:ArpsO strain. We 
chose to use a surrogate organism, E. coli, due to its ease 
of use and manipulation, and the ability to obtain a knock- 
out organism lacking endogenous S15. Using this GFP 
in vivo reporter system, we assessed the ability of Rra-S15 
to regulate gene expression by measuring the GFP levels 
in the cells in the presence and absence of induced Rra- 
S15. If the RNA interacts with Rra-S15 to regulate gene 
expression, we expect to see a decrease in GFP expression 



in cells expressing Rra-S15 compared to cells not express- 
ing Rra-S15. 

Cells co-transformed with plasmids containing full 
length Rra-RNAl-GFP, and Rra-S15 were grown in the 
presence and absence of L-arabinose. The cells grown in 
the presence of the sugar (induced Rra-S15) displayed a 
~4-fold decrease in GFP-reporter expression (Figure 5). 
Because L-arabinose induces Rra-S15 production, the 
decrease in GFP reporter expression is likely due to 
an interaction between the RNA and Rra-S15. Next, to 
corroborate that our predicted transcription start site at 
the C-95 allows regulation, the sequence for Rra-RNA3 
(nucleotides -78 to +27) was also tested in this system 
and behaved in a similar manner. These results indicate 
an RNA sequence starting at the transcription start site 
derived from R. spaeroid.es is sufficient to allow regula- 
tion in vivo. 

We also examined whether mutations to HI that abolish 
Rra-S15 binding would affect regulation. Cells containing 
either Rra-RNA9-GFP and Rra-RNAIO-GFP did not dis- 
play a significant difference in GFP reporter expression 
when grown in the presence and absence of L-arabinose. 
This is likely because both Rra-RNA9 and Rra-RNAIO do 
not interact specifically with S15 and are unable to regu- 
late the expression of the reporter. However, it should be 
noted that GFP expression levels in the absence of arabin- 
ose were also significantly lower than those observed with 
the Rra-RNAl and Rra-RNA3 constructs. 

To assess whether the partial compensation of binding 
observed for RraRNAll in vitro represented a biologically 
relevant, functional compensation, Rra-RNAl 1 was also 
examined in this system. In this case, cells grown without 
L-arabinose displayed an increased GFP expression level 
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compared to Rra-RNAl, and cells grown in the presence 
of L-arabinose had a ~10-fold decrease in relative GFP 
fluorescence (Figure 5). However, the increase in fold- 
change is solely due to increased RNA11-GFP expression 
levels and the repressed level of gene expression is com- 
parable between the two RNA elements. Thus, the Rra- 
RNAl 1 compensatory mutation that partially restored the 
in vitro RNA-protein interaction also restored the regula- 
tory interaction between the Rra-RNA and Rra-S15. The 
partial restoration of in vitro binding by the compensatory 
mutant Rra-RNA 11 is likely due to the presence of several 
competing structures formed by the RNA under these 
conditions. However, the in vivo conditions enable the 
RNA to adopt a secondary structure that increases overall 
reporter expression and enables regulation in response to 
S15. Together, these assays indicate that not only the does 
this RNA interact with Rra-S15 in vitro, but it is a bio- 
logically relevant regulatory element responding to S15. 

Structural probing confirms predicted secondary 
structure 

To further examine the secondary structure of Rra-RNA 
in the absence of protein we used several structural 
probing methods in combination with a minimal RNA 
construct (Rra-RNA6) including nuclease cleavage assays 
with (RNase VI and RNase A), and in-line probing. 
RNase VI cleaves double stranded RNA non-specifically, 
RNase A cleaves single stranded C's and U's, and in-line 



probing the RNA structure reveals the flexible regions of 
the RNA structure (and likely single-stranded regions) 
that are more prone to spontaneous self-cleavage. 

Although the putative stem H2 is predicted in our align- 
ment (Additional file 1), there are many sequences that 
contain short polypyrimidine sequences that are unpaired 
in our sequence alignment. These sequences may form al- 
ternative pairings with the ribosome-binding site (AG rich 
region ~8 nucleotides before the translation start site). 
Based on sequence data alone it is difficult to distinguish 
which bases are interacting with the ribosome-binding 
site. However, several lines of evidence indicate that we 
have identified the correct in vitro base-pairing conform- 
ation for our putative H2 in the R. radiobacter example of 
the RNA (Figure 6). First, our mutagenesis and truncation 
analyses indicate that mutating the polyuridine (U-26 to 
U-30 in Rra-RNA8) does not alter protein-binding activity. 
This suggests that this region is unlikely to interact with 
the putative ribosome-binding site (-13 to -8). However, 
deleting this region and the following hairpin (Rrad- 
RNA7) abolishes protein binding indicating that H2 is im- 
portant for protein binding. Second, RNAse VI cleavage 
occurs symmetrically in regions that are base-paired in 
our figure (-8 through -12 and -23 to -28), and RNaseA 
cleavage occurs at C-17 as would be expected for a loop 
region. In addition, in-line probing shows that the entire 
3' portion of the molecule is somewhat flexible, from 
bases -11 through -22 (region A). In conjunction with 
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our mutagenesis results, this strongly suggests the cor- 
rect pairing-element has been identified. 

Consistent with our mutagenesis results, the highly 
conserved stem HI (bases U-42 to A-65) is almost cer- 
tainly double-stranded with a loop from C-49 to A-56. 
Bases U-42 through C-49 are shielded from in-line at- 
tack and there are strong cleavage bands in RNAse VI 
probing for bases C-43, bases -59 to -61, and G-64. Also, 
there are no RNAse A cleavage products for any of these 
uracils or cytosines, suggesting that these nucleotides 
are not single-stranded. There is also evidence for the 
predicted loop region in HI. Probing with RNAse A re- 
sults in cleavage products for C-53 and C-56 and in-line 
probing reveals that C-53 through A-57 (region D) are 
flexible. At the base of the H2 stem, we predict a bulged 
adenosine (A-41) and the highlighted region C of our in- 
line probing gel corresponds to this bulged base. This data 
corroborates our other evidence that the region essential 
for Rra-S15 binding in HI forms a double-helix. 

The nature of the junction between the two predicted 
helices is still unresolved. This region is not well-conserved 
so there is little phylogenetic evidence of structure, and in 
several cases the different assays give conflicting results, 
which may be the result of multiple folding conformations. 
The string of uridines from U-27 to U-30 does not appear 
to be flexible based on in-line probing, is cleaved by RNA- 
ase VI, and is not cleaved by RNase A, indicating that the 
region is not single-stranded. However, there are also no 
clear binding partners for these nucleotides suggesting that 
they may be forming a constrained tertiary structure. The 
string of cytosines that follows this region, C-32 to C-35, 
do show strong RNase A cleavage suggesting they are 
single-stranded, and this is corroborated by the in-line 
cleavage at these positions (region B). However, these 
bases also display RNAse VI cleavage indicating that 
they may sometimes adopt a double-stranded conform- 
ation. Nucleotides from -35 to -40 are not cleaved by ei- 
ther RNase VI or RNAse A, and appear to be structurally 
constrained. This suggests that they are not necessarily 
double-stranded, but may be participating in some tertiary 
structure. Nucleotides -75 to -80, which potentially could 
interact with these bases, also show conflicting results, 
cleaving with both RNase VI and RNase A. We have in- 
cluded the possible base pairing of the nucleotides at the 
base of HI in our structure figures (Figures 4 and 6), but 
these interactions are likely weak. The accumulation of 
these data strongly suggests that both the predicted 
pairing elements do form, but the nature of the junction 
between these two elements remains unclear. 

Conclusions 

This work demonstrates the premise that nature may 
invent many unique ways to solve a single biological 
problem. In the context of other forms of RNA-based 



regulation the diversity of distinct RNA structures allow- 
ing tis-regulation of the rpsO operon is nearly unmatched. 
The only similar example of such diversity in RNA 
regulators for a specific function are the SAM-binding 
riboswitches, where more than three completely distinct 
classes [8-10], and several additional sub-classes with re- 
arranged or modified secondary structure elements have 
been characterized [41,42]. The S15 auto-regulatory RNA 
structures we identified are quite diverse from one an- 
other, and from the existing known characterized S15 
regulatory RNAs that originate from E. coli, G. stearopther- 
mophilus, and T. thermophilus [22-24]. 

All of the previously characterized RNA structures en- 
compass a predicted Shine-Dalgarno sequence, but be- 
yond such regulatory features the RNAs appear to share 
very few common sequence features or patterns in sec- 
ondary structure. While the S15-interacting RNAs poten- 
tially share some tertiary structure similarities that are not 
captured in the secondary structure diagrams, previous 
studies indicate that the E. coli S15 does not interact with 
the regulatory RNA originating from G. stearothermophi- 
lus [26] . This finding suggests that there may be no single 
conserved tertiary structure shared by the S15-binding 
mRNA structures. In the absence of structural data, it re- 
mains to be seen whether the structural diversity apparent 
in natural S15-interacting mRNA structures is a result of 
RNAs inherent ability to generate a similar tertiary struc- 
ture from diverse arrangements of primary and secondary 
structure [21,43], or from differences between the S15 
protein homologs that lead to distinct pools of potential 
RNA ligands. From the structures we describe here, it is 
clear that there are many ways to solve this particular bio- 
logical problem. An exciting question that stems from this 
work is precisely how prevalent are RNA solutions within 
sequence space that allow autogenous regulation in re- 
sponse to S15. Based on the natural diversity of S15- 
interacting RNAs, we expect that this number is large, 
and that as more genomes are sequenced and the sensitiv- 
ity of computational searches increases, additional struc- 
tures with this function will be identified. 

Methods 

Computational identification of putative RNAs and 
curation of RNA alignments 

RpsO was identified in the genomes of fully sequenced 
bacteria (refseq58-microbial [44]) using tBLASTn [45]. 
Sequences corresponding to the putative 5' non-coding 
regions (500 nucleotides 5 ' of the translation start, or the 
end of the previous gene) in addition to 25 nucleotides of 
the rpsO coding region were collected. Sequences contain- 
ing >90% sequence identity over >70% of the sequence 
length were removed as redundant. The remaining se- 
quences were clustered based on taxonomy into groups 
of 100 or fewer sequences. CMfinder was run on these 
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clusters with the default parameters [30]. The resulting 
alignments were manually curated to identify the most 
promising RNA candidates. 

Covariance models for each RNA alignment were 
constructed and calibrated using Infernal 1.1 (cmbuild 
-F, cmcalibrate -cpu 4), and homologues were identi- 
fied for each alignment [31]. Cmsearch was performed 
against a custom sequence database described above 
using a lenient e-value cut-off of 1.0 (cmsearch -E 1.0 - 
mid -cpu 4). Sequences were then aligned using cmalign 
(-mapali -cpu 3 -noprob). Alignments were subsequently 
manually adjusted as necessary when sequences with 
variable-length helices and/or loops were added. The 
search process was repeated approximately 3-4 times per 
multiple sequence alignment, to expand sequence diver- 
sity. During the course of these searches, the alignments 
were extended at the 5 ' and 3 ' ends to encompass any po- 
tential flanking sequence and pseudoknotted or alternative 
structures were identified through curation of the align- 
ment. Transcription start sites were identified through 
examination of mapped read-depths derived from RNA-seq 
data [34,46] compiled at AREBA (An RNA Encyclopaedia 
for Bacteria and Archaea (https://github.com/UCanComp- 
Bio/AREBA), or from previously assessed transcription start 
sites in the literature [37]. Consensus secondary structure 
diagrams were created from the alignments using GSC- 
weighting in R2R [47]. 

RNA preparation 

DNA corresponding to the 5'-UTR of the rpsO gene with 
the T7-promoter appended was PCR amplified from R. 
radiobacter genomic DNA. Mutants 8-10 were generated 
through QuickChange mutagenesis on Rra-RNAl tem- 
plate, then PCR amplified using Rra-RNAl primer set. T7 
RNA Polymerase [48] was used to transcribe RNA, and 
RNAs were purified by denaturing PAGE (6%), bands vi- 
sualized using UV shadow, and RNA eluted from excised 
bands in 300 mM NaCl, 1 mM EDTA. Purified RNA 
was 5 '-labeled with 32 P-ATP [49] and again purified as 
described above. 

Protein preparation 

The R. radiobacter rpsO ORF was cloned into pET-HT 
overexpression vector [50] and transformed into BL-21 
(DE3) cells (Invitrogen). Protein was over-expressed and 
cells lysed by sonication using S15 Resuspension Buffer 
(100 mM Tris/HCl, pH 8.0, 800 mM NaCl, 150 mM 
MgCl 2 ). S15 was soluble and was purified at 4°C using 
non-denaturing FPLC cation exchange chromatography, 
pH 5.5, with a linear salt gradient (100 mM-1 M NaCl) 
[51]. A second purification was performed under conditions 
previously described [52] using pH 8.0 and a linear salt 
gradient (20 mM - 1 M KC1) at 4°C by non-denaturing 
FPLC cation exchange chromatography. RNAse-free protein 



fractions were concentrated, analyzed via SDS-PAGE, and 
buffer exchanged for the S15 Storage Buffer (50 mM Tris/ 
Acetate, pH 7.5, 20 mM Mg-Acetate, 270 mM KC1), Final 
protein concentration was determined by Bradford assay 
and stored at 4°C. 

Filter-binding assays 

RNA binding capability was examined by filter-binding 
assay (FBA). A fixed amount of 5 -labeled RNA 
(1000 cpm, <1 nM) was renatured for 15 minutes at 42°C, 
then incubated with serial dilutions of S15 in Buffer A 
(50 mM-Tris/Acetate, pH 7.5, 20 mM Mg-Acetate, 
270 mM KC1, 5 mM dithiothreitol, 0.02% bovine serum al- 
bumin), for 30 minutes at 25°C. Nitrocellulose membrane 
(GE Healthcare) was used to collect RNA-S15 and nylon 
(GE Healthcare) to collect unbound RNA under suction. 
Membranes were air-dried for 5 minutes and the fraction 
bound quantified by imaging membranes on a phosphori- 
mager screen. Radioactivity counts per sample on each 
membrane were measured using GE Healthcare STORM 
820 phosphorimager and ImageQuant. For each sample, 
the fraction bound (Fb) corresponds to the (counts nitro- 
cellulose) /(counts nitrocellulose + counts nylon). To deter- 
mine the K D and the maximum fraction bound (Max%), 
the resulting values were fit to the equation: Fb = (Max%* 
[S15])/([S15] + K D ) where [S15] corresponds to the con- 
centration of S15 in the reaction. The residuals were mini- 
mized using the Solver function in Microsoft Excel to find 
both the Max% and the I<d- Kd values given in the text 
represent the mean of 3 or more independent binding as- 
says ± the standard deviation. 

Structural and nuclease probing assays 

The RNA-protein binding reaction described above was 
used for RNAse probing assays. After incubation, 1 uL 
RNAse A (1 ug/mL, Ambion) or RNAse VI (1:400 dilution 
of 0.1 U/uL, Ambion) was added and the reaction incu- 
bated for 15 minutes at 25°C. The nuclease was inacti- 
vated with inactivation/precipitation buffer (Life Sciences) 
and RNA fragments recovered by ethanol precipitation. 
Precipitated RNAs were suspended in 10 uL Urea Loading 
solution (Life Sciences) and incubated for 5 minutes at 
95°C. Five uL of each reaction was loaded on 10% denatur- 
ing Acrylamide/Bis-acrylamide gel. The gel was dried and 
examined using a GE Healthcare STORM 820 phosphori- 
mager and ImageQuant software. Partial hydroxyl cleavage 
reactions were generated by incubating RNA in Reaction 
Buffer (50 mM Na 2 C0 3 pH 9.0, 1 mM EDTA) at 95°C for 
7 minutes. Denaturing Tl reaction was conducted accord- 
ing to manufacture's protocol (Ambion). For in-line prob- 
ing, 5 -labeled RNA was incubated for 40 hours at 25°C in 
Reaction Buffer (20 mM MgCl 2 , 100 mM KC1, 50 mM tris 
pH 8.3). The reaction was stopped using Urea loading so- 
lution (10 M Urea, 1.5 mM EDTA). 
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Plasmid construction 

The ptrc-RNA-GFP plasmid was constructed from pLac- 
thiMwt-tetA-gfpuv plasmid [53] as outlined in detail in 
Additional file 4. Essentially, the RNA sequence and the 
first nine codons were placed as a translational fusion with 
GFP replacing the thiamine responsive riboswitch and 
existing ribosome-binding site. The pre-existing lac pro- 
moter was replaced with a trc promoter. The protein ex- 
pression plasmid was constructed by amplifying the DNA 
fragment encoding the rpsO gene containing a SacI re- 
striction site and a ribosome-binding site on the 5' 
terminus (5 ' -caagagctcaggaggttttaaaatgtcgattactgcagagcgc 
aaag) and Xbal site on the 3' terminus (5 - caatctagatta 
gcggcgaatgccgagagc) from genomic DNA extracted from 
R. radiobacter (ATCC 23308). The PCR product was 
digested with SacI and Xbal enzymes and inserted into 
the pBAD33 expression vector (ATCC 87402) digested 
with the same enzymes. 
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E. coli regulatory assays 

K12: ArpsO E. coli cells (CGSC# 7154: strain CK1953, E. 
coli Stock Center Yale University) were co-transformed 
with an RNA and protein plasmid (made competent 
using the Z-competent buffer system, Zymo Research). 
Overnight cultures were grown +/- L-arabinose (15 mM), 
then diluted the next day to OD 60 o = 0.150 in fresh media 
(LB + 100 ug/mL AMP + 34 ug/mL CHL +/- 15 mM 
L-arabinose). At log phase, IPTG (2 mM final) was added 
to induce GFP expression and cells were grown an add- 
itional 5 hours. Cells were collected, washed with PBS, 
then stored in PBS overnight. GFP expression was mea- 
sured using a SpectraMax M5 fluorimeter (excitation: 
395 nm, emission: 508 nm, Molecular Devices). Fluores- 
cence was calculated by normalizing GFP to cell density 
(GFP/OD 600 ). 

Availability of supporting data 

The software described in Figure 2 is available under an 
MIT open license at https://github.com/jsa-aerial/gaisr. Se- 
quence alignments corresponding to the structured RNAs 
in Figure 3 are available as Additional file 1, Additional file 2, 
and Additional file 3. Nucleic acid probing data displayed 
in Figure 6 has been deposited at http://snmasm.bio.unc.edu. 
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